We will cover some functions of the YouTube Data API v3 from the Google Developers Console. There is an official Python client library for Google APIs, but we will access the API with plain HTTP requests instead.
In [73]:
api_key = ""
In [57]:
from __future__ import division
from datetime import datetime
import json
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests
from lxml import html, etree
from textblob import TextBlob

warnings.filterwarnings('ignore')
pd.options.display.max_columns = 100
pd.options.display.max_rows = 35
pd.options.display.width = 120
https://developers.google.com/youtube/v3/docs/search
GET https://www.googleapis.com/youtube/v3/search
Parameter name | Value | Description
---|---|---
**Required parameters** | |
part | string | Specifies a comma-separated list of one or more search resource properties that the API response will include. Set the parameter value to snippet. The snippet part has a quota cost of 1 unit.
**Filters (specify 0 or 1 of the following parameters)** | |
forContentOwner | boolean | Can only be used in a properly authorized request. Note: intended exclusively for YouTube content partners. Restricts the search to resources owned by the content owner specified by the onBehalfOfContentOwner parameter. The user must be authenticated with a CMS account linked to the specified content owner, and onBehalfOfContentOwner must be provided.
forMine | boolean | Can only be used in a properly authorized request. Restricts the search to videos owned by the authenticated user. If set to true, the type parameter must also be set to video.
relatedToVideoId | string | Retrieves a list of videos related to the video that the parameter value identifies. The value must be a YouTube video ID and, when this parameter is used, type must be set to video.
**Optional parameters** | |
channelId | string | Indicates that the API response should only contain resources created by the channel.
channelType | string | Restricts a search to a particular type of channel.
eventType | string | Restricts a search to broadcast events. If specified, type must also be set to video.
location | string | In conjunction with locationRadius, defines a circular geographic area and restricts a search to videos whose metadata specifies a geographic location within that area. The value is a latitude/longitude pair, e.g. 37.42307,-122.08427. An error is returned if a request specifies location without also specifying locationRadius.
locationRadius | string | In conjunction with location, defines a circular geographic area. The value must be a floating-point number followed by a measurement unit: m, km, ft, or mi (e.g. 1500m, 5km, 10000ft, 0.75mi). Values larger than 1000 kilometers are not supported.
maxResults | unsigned integer | Maximum number of items to return in the result set. Acceptable values are 0 to 50, inclusive; the default is 5.
onBehalfOfContentOwner | string | Can only be used in a properly authorized request. Note: intended exclusively for YouTube content partners. Indicates that the request's authorization credentials identify a YouTube CMS user acting on behalf of the content owner specified in the parameter value, letting content owners authenticate once and access all of their video and channel data without providing credentials for each individual channel. The CMS account must be linked to the specified content owner.
order | string | The method used to order resources in the API response. The default is relevance.
pageToken | string | Identifies a specific page in the result set to return. In an API response, the nextPageToken and prevPageToken properties identify other pages that can be retrieved.
publishedAfter | datetime | Only return resources created after the specified time, an RFC 3339 date-time value (e.g. 1970-01-01T00:00:00Z).
publishedBefore | datetime | Only return resources created before the specified time, an RFC 3339 date-time value (e.g. 1970-01-01T00:00:00Z).
q | string | The query term to search for. Supports the Boolean NOT (-) and OR (\|) operators, e.g. boating\|sailing, or boating\|sailing -fishing to exclude "fishing". The pipe character must be URL-escaped (%7C) when sent in the request.
regionCode | string | Return search results for the specified country, given as an ISO 3166-1 alpha-2 country code.
safeSearch | string | Whether search results should include restricted content as well as standard content.
topicId | string | Only return resources associated with the specified topic; the value identifies a Freebase topic ID.
type | string | Restricts a search to a particular type of resource, given as a comma-separated list. The default is video,channel,playlist.
videoCaption | string | Filter results based on whether videos have captions. If specified, type must be set to video.
videoCategoryId | string | Filter results by video category. If specified, type must be set to video.
videoDefinition | string | Restrict a search to high definition (HD) or standard definition (SD) videos. HD videos play back in at least 720p, though higher resolutions like 1080p may also be available. If specified, type must be set to video.
videoDimension | string | Restrict a search to 2D or 3D videos. If specified, type must be set to video.
videoDuration | string | Filter results by video duration. If specified, type must be set to video.
videoEmbeddable | string | Restrict a search to videos that can be embedded in a web page. If specified, type must be set to video.
videoLicense | string | Only include videos with a particular license; uploaders can attach either the Creative Commons license or the standard YouTube license. If specified, type must be set to video.
videoSyndicated | string | Restrict a search to videos that can be played outside youtube.com. If specified, type must be set to video.
videoType | string | Restrict a search to a particular type of video. If specified, type must be set to video.
The parameters we will use:

- part: set to id (returns only resource ID data) or snippet (returns some basic metadata about the resource)
- channelId
- maxResults
- order
- pageToken
- publishedAfter
- publishedBefore
- q
- key
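As an offline sanity check (my addition, using Python 3's urllib.parse rather than the requests call the notebook makes), this sketch shows how such a parameter set is encoded into the request URL. Note the pipe in the q value becoming %7C, as the docs require:

```python
from urllib.parse import urlencode

# Hypothetical parameter set mirroring the search request (API key omitted).
params = {
    "part": "snippet",
    "maxResults": 5,
    "order": "date",
    "q": "boating|sailing -fishing",  # OR and NOT operators
    "type": "video",
}
query = urlencode(params)  # percent-encodes each value
url = "https://www.googleapis.com/youtube/v3/search?" + query
# The pipe character in q is URL-escaped to %7C.
print("%7C" in query)  # -> True
```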
In [3]:
parameters = {"part": "snippet",
"maxResults": 5,
"order": "date",
"pageToken": "",
"publishedAfter": "2008-08-04T00:00:00Z",
"publishedBefore": "2008-11-04T00:00:00Z",
"q": "",
"key": api_key,
"type": "video",
}
url = "https://www.googleapis.com/youtube/v3/search"
In [4]:
parameters["q"] = "Mark Udall"
page = requests.request(method="get", url=url, params=parameters)
j_results = json.loads(page.text)
print page.text
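The response is JSON with the shape sketched below; a stubbed, abridged stand-in (the IDs are invented, not real API output) shows how the video IDs used later in this notebook are pulled out:

```python
import json

# Abridged stand-in for a search.list response; structure per the API docs.
stub = '''{
  "nextPageToken": "CAUQAA",
  "items": [
    {"id": {"kind": "youtube#video", "videoId": "abc123"}},
    {"id": {"kind": "youtube#video", "videoId": "def456"}}
  ]
}'''
j = json.loads(stub)
# Each search result nests the video ID under item["id"]["videoId"].
video_ids = [item["id"]["videoId"] for item in j["items"]]
print(video_ids)  # -> ['abc123', 'def456']
```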
https://developers.google.com/youtube/v3/docs/videos/list
GET https://www.googleapis.com/youtube/v3/videos
Parameter name | Value | Description
---|---|---
**Required parameters** | |
part | string | Specifies a comma-separated list of one or more video resource properties that the API response will include. If the parameter identifies a property that contains child properties, the child properties are included in the response; for example, in a video resource the snippet property contains the channelId, title, description, tags, and categoryId properties, so setting part=snippet returns all of them. Each part has its own quota cost.
**Filters (specify exactly one of the following parameters)** | |
chart | string | Identifies the chart that you want to retrieve.
id | string | A comma-separated list of YouTube video ID(s) for the resource(s) being retrieved. In a video resource, the id property specifies the video's ID.
myRating | string | Can only be used in a properly authorized request. Set to like or dislike to return only videos liked or disliked by the authenticated user.
**Optional parameters** | |
maxResults | unsigned integer | Maximum number of items to return in the result set. Note: supported in conjunction with myRating, but not with id. Acceptable values are 1 to 50, inclusive; the default is 5.
onBehalfOfContentOwner | string | Can only be used in a properly authorized request. Note: intended exclusively for YouTube content partners. Indicates that the request's authorization credentials identify a YouTube CMS user acting on behalf of the content owner specified in the parameter value. The CMS account must be linked to the specified content owner.
pageToken | string | Identifies a specific page in the result set to return. In an API response, nextPageToken and prevPageToken identify other pages. Note: supported in conjunction with myRating, but not with id.
regionCode | string | Select a video chart available in the specified region; can only be used with the chart parameter. The value is an ISO 3166-1 alpha-2 country code.
videoCategoryId | string | Identifies the video category for which the chart should be retrieved; can only be used with the chart parameter. By default, charts are not restricted to a particular category. The default value is 0.
In [5]:
parameters = {"part": "statistics",
"id": "5Q98TvXjIZg",
"key": api_key,
}
url = "https://www.googleapis.com/youtube/v3/videos"
In [6]:
page = requests.request(method="get", url=url, params=parameters)
j_results = json.loads(page.text)
print page.text
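A stubbed videos.list response (abridged; the counts are invented) shows the one gotcha when reading statistics: the counts arrive as JSON strings and must be cast before any arithmetic:

```python
import json

# Abridged stand-in for a videos.list?part=statistics response.
stub = '''{
  "items": [
    {"id": "5Q98TvXjIZg",
     "statistics": {"viewCount": "1234", "likeCount": "56",
                    "dislikeCount": "7", "commentCount": "8",
                    "favoriteCount": "0"}}
  ]
}'''
stats = json.loads(stub)["items"][0]["statistics"]
# Counts are strings in the JSON; cast them to integers before summing.
views = int(stats["viewCount"])
print(views)  # -> 1234
```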
I'll check the correlation between US Senate election results and YouTube stats, starting with the 2014 Colorado Senate race: Cory Gardner (R) vs. Mark Udall (D).
In [7]:
def _search_list(q="", publishedAfter=None, publishedBefore=None, pageToken=""):
parameters = {"part": "id",
"maxResults": 50,
"order": "viewCount",
"pageToken": pageToken,
"q": q,
"type": "video",
"key": api_key,
}
url = "https://www.googleapis.com/youtube/v3/search"
if publishedAfter: parameters["publishedAfter"] = publishedAfter
if publishedBefore: parameters["publishedBefore"] = publishedBefore
page = requests.request(method="get", url=url, params=parameters)
return json.loads(page.text)
def search_list(q="", publishedAfter=None, publishedBefore=None, max_requests=10):
more_results = True
pageToken=""
results = []
for counter in range(max_requests):
j_results = _search_list(q=q, publishedAfter=publishedAfter, publishedBefore=publishedBefore, pageToken=pageToken)
items = j_results.get("items", None)
if items:
results += [item["id"]["videoId"] for item in j_results["items"]]
        if "nextPageToken" in j_results:
pageToken = j_results["nextPageToken"]
else:
return results
else:
return results
return results
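The token-following loop in search_list can be exercised offline by stubbing out the fetch step; fetch_page below is a made-up stand-in for _search_list, serving three fake pages:

```python
# Three fake pages of results; the last one has no nextPageToken.
PAGES = {
    "": {"items": [{"id": {"videoId": "v1"}}], "nextPageToken": "p2"},
    "p2": {"items": [{"id": {"videoId": "v2"}}], "nextPageToken": "p3"},
    "p3": {"items": [{"id": {"videoId": "v3"}}]},
}

def fetch_page(token=""):
    # Stand-in for _search_list: look the "page" up by its token.
    return PAGES[token]

def collect_ids(max_requests=10):
    token, results = "", []
    for _ in range(max_requests):
        page = fetch_page(token)
        results += [item["id"]["videoId"] for item in page.get("items", [])]
        token = page.get("nextPageToken")
        if not token:  # no more pages to fetch
            break
    return results

print(collect_ids())  # -> ['v1', 'v2', 'v3']
```

max_requests caps quota usage the same way it does in search_list: with max_requests=2 only the first two pages are collected.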
def _video_list(video_id_list):
parameters = {"part": "statistics",
"id": ",".join(video_id_list),
"key": api_key,
"maxResults": 50
}
url = "https://www.googleapis.com/youtube/v3/videos"
page = requests.request(method="get", url=url, params=parameters)
j_results = json.loads(page.text)
df = pd.DataFrame([item["statistics"] for item in j_results["items"]], dtype=np.int64)
df["video_id"] = [item["id"] for item in j_results["items"]]
parameters["part"] = "snippet"
page = requests.request(method="get", url=url, params=parameters)
j_results = json.loads(page.text)
df["publishedAt"] = [item["snippet"]["publishedAt"] for item in j_results["items"]]
df["publishedAt"] = df["publishedAt"].apply(lambda x: datetime.strptime(x, "%Y-%m-%dT%H:%M:%S.000Z"))
df["date"] = df["publishedAt"].apply(lambda x: x.date())
df["week"] = df["date"].apply(lambda x: x.isocalendar()[1])
df["channelId"] = [item["snippet"]["channelId"] for item in j_results["items"]]
df["title"] = [item["snippet"]["title"] for item in j_results["items"]]
df["description"] = [item["snippet"]["description"] for item in j_results["items"]]
df["channelTitle"] = [item["snippet"]["channelTitle"] for item in j_results["items"]]
df["categoryId"] = [item["snippet"]["categoryId"] for item in j_results["items"]]
return df
def video_list(video_id_list):
values = []
for index, item in enumerate(video_id_list[::50]):
t_index = index * 50
values.append(_video_list(video_id_list[t_index:t_index+50]))
return pd.concat(values)
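video_list batches IDs 50 at a time before calling _video_list. The enumerate/slice trick above is equivalent to this plainer chunking helper, shown here with dummy IDs:

```python
def chunks(seq, size=50):
    # Yield consecutive slices of at most `size` items.
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

ids = ["vid%d" % i for i in range(120)]  # dummy video IDs
sizes = [len(batch) for batch in chunks(ids)]
print(sizes)  # -> [50, 50, 20]
```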
In [8]:
def get_data(candidates, publishedAfter, publishedBefore):
results_list = []
for q in candidates:
results = search_list(q=q,
publishedAfter=publishedAfter,
publishedBefore=publishedBefore,
max_requests=50)
stat_data_set = video_list(results)
stat_data_set["candidate_name"] = q
results_list.append(stat_data_set)
data_set = pd.concat(results_list)
return data_set
def get_2008_data(candidates):
return get_data(candidates, publishedAfter="2008-08-04T00:00:00Z", publishedBefore="2008-11-04T00:00:00Z")
def get_2010_data(candidates):
return get_data(candidates, publishedAfter="2010-08-04T00:00:00Z", publishedBefore="2010-11-04T00:00:00Z")
def get_2012_data(candidates):
return get_data(candidates, publishedAfter="2012-08-04T00:00:00Z", publishedBefore="2012-11-04T00:00:00Z")
def get_2014_data(candidates):
return get_data(candidates, publishedAfter="2014-08-04T00:00:00Z", publishedBefore="2014-11-04T00:00:00Z")
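The four wrappers differ only in the year, so a single helper (my addition, not in the original notebook) can build the RFC 3339 window for any election year:

```python
def election_window(year):
    # Aug 4 .. Nov 4 of the given year, formatted as the RFC 3339
    # timestamps that publishedAfter/publishedBefore expect.
    return ("%d-08-04T00:00:00Z" % year, "%d-11-04T00:00:00Z" % year)

after, before = election_window(2014)
print(after, before)  # -> 2014-08-04T00:00:00Z 2014-11-04T00:00:00Z
```

get_data(candidates, *election_window(2014)) would then reproduce get_2014_data(candidates).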
In [9]:
candidates = ["Cory Gardner", "Mark Udall"] # Cory Gardner (R), Mark Udall (D)*
colorado_2014_ds = get_2014_data(candidates)
pd.pivot_table(colorado_2014_ds, values=["commentCount", "favoriteCount", "dislikeCount", "likeCount", "viewCount"],
aggfunc='sum', rows="candidate_name")
Out[9]:
In [10]:
for candidate, color in zip(candidates, ["r", "b"]):
cand = colorado_2014_ds[colorado_2014_ds["candidate_name"]==candidate]
by_date = cand["week"].value_counts()
by_date = by_date.sort_index()
dates = by_date.index
plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos Published")
plt.xlabel("Week")
plt.show()
In [11]:
for candidate, color in zip(candidates, ["r", "b"]):
cand = colorado_2014_ds[colorado_2014_ds["candidate_name"]==candidate]
by_date = pd.pivot_table(cand, rows=["week"], values=["viewCount"], aggfunc="sum")
by_date = by_date.sort_index()
dates = by_date.index
plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos viewCount")
plt.xlabel("Week")
plt.show()
In [12]:
for candidate, color in zip(candidates, ["r", "b"]):
cand = colorado_2014_ds[colorado_2014_ds["candidate_name"]==candidate]
by_date = pd.pivot_table(cand, rows=["week"], values=["likeCount"], aggfunc="sum")
by_date = by_date.sort_index()
dates = by_date.index
plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos likeCount")
plt.xlabel("Week")
plt.show()
In [13]:
for candidate, color in zip(candidates, ["r", "b"]):
cand = colorado_2014_ds[colorado_2014_ds["candidate_name"]==candidate]
by_date = pd.pivot_table(cand, rows=["week"], values=["dislikeCount"], aggfunc="sum")
by_date = by_date.sort_index()
dates = by_date.index
plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos dislikeCount")
plt.xlabel("Week")
plt.show()
In [14]:
candidates = ["George Allen", "Tim Kaine"] # George Allen (R), Tim Kaine (D), winner
va_2012_ds = get_2012_data(candidates)
pd.pivot_table(va_2012_ds, values=["commentCount", "favoriteCount", "dislikeCount", "likeCount", "viewCount"],
aggfunc='sum', rows="candidate_name")
Out[14]:
In [15]:
for candidate, color in zip(candidates, ["r", "b"]):
cand = va_2012_ds[va_2012_ds["candidate_name"]==candidate]
by_date = cand["week"].value_counts()
by_date = by_date.sort_index()
dates = by_date.index
plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos Published")
plt.xlabel("Week")
plt.show()
In [16]:
for candidate, color in zip(candidates, ["r", "b"]):
cand = va_2012_ds[va_2012_ds["candidate_name"]==candidate]
by_date = pd.pivot_table(cand, rows=["week"], values=["viewCount"], aggfunc="sum")
by_date = by_date.sort_index()
dates = by_date.index
plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos viewCount")
plt.xlabel("Week")
plt.show()
In [17]:
for candidate, color in zip(candidates, ["r", "b"]):
cand = va_2012_ds[va_2012_ds["candidate_name"]==candidate]
by_date = pd.pivot_table(cand, rows=["week"], values=["likeCount"], aggfunc="sum")
by_date = by_date.sort_index()
dates = by_date.index
plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos likeCount")
plt.xlabel("Week")
plt.show()
In [18]:
for candidate, color in zip(candidates, ["r", "b"]):
cand = va_2012_ds[va_2012_ds["candidate_name"]==candidate]
by_date = pd.pivot_table(cand, rows=["week"], values=["dislikeCount"], aggfunc="sum")
by_date = by_date.sort_index()
dates = by_date.index
plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos dislikeCount")
plt.xlabel("Week")
plt.show()
In [19]:
candidates = ["Dean Heller", "Shelley Berkley"] # Dean Heller (R)*, winner; Shelley Berkley (D)
nv_2012_ds = get_2012_data(candidates)
print pd.pivot_table(nv_2012_ds, values=["commentCount", "favoriteCount", "dislikeCount", "likeCount", "viewCount"],
aggfunc='sum', rows="candidate_name")
for candidate, color in zip(candidates, ["r", "b"]):
cand = nv_2012_ds[nv_2012_ds["candidate_name"]==candidate]
by_date = cand["week"].value_counts()
by_date = by_date.sort_index()
dates = by_date.index
plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos Published")
plt.xlabel("Week")
plt.show()
for candidate, color in zip(candidates, ["r", "b"]):
cand = nv_2012_ds[nv_2012_ds["candidate_name"]==candidate]
by_date = pd.pivot_table(cand, rows=["week"], values=["viewCount"], aggfunc="sum")
by_date = by_date.sort_index()
dates = by_date.index
plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos viewCount")
plt.xlabel("Week")
plt.show()
for candidate, color in zip(candidates, ["r", "b"]):
cand = nv_2012_ds[nv_2012_ds["candidate_name"]==candidate]
by_date = pd.pivot_table(cand, rows=["week"], values=["likeCount"], aggfunc="sum")
by_date = by_date.sort_index()
dates = by_date.index
plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos likeCount")
plt.xlabel("Week")
plt.show()
for candidate, color in zip(candidates, ["r", "b"]):
cand = nv_2012_ds[nv_2012_ds["candidate_name"]==candidate]
by_date = pd.pivot_table(cand, rows=["week"], values=["dislikeCount"], aggfunc="sum")
by_date = by_date.sort_index()
dates = by_date.index
plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos dislikeCount")
plt.xlabel("Week")
plt.show()
In [20]:
url = "http://www.senate.gov/general/contact_information/senators_cfm.xml"
response = requests.get(url)
tree = etree.fromstring(response.content)
print tree
In [21]:
member_full = [member.xpath("member_full")[0].text for member in tree.xpath("//member")]
senators = pd.DataFrame(member_full, columns=["member_full"])
fields = ["last_name", "first_name", "party", "state", "address",
          "phone", "website", "bioguide_id", "class"]
for field in fields:
    senators[field] = [member.xpath(field)[0].text for member in tree.xpath("//member")]
senators
Out[21]:
In [22]:
by_party = senators["party"].value_counts()
by_party.sort(ascending=False)
print by_party
color_dict = {"D": "b",
"R": "r",
"I": "g"}
labels = ["%s: %s" % (by_party.index[index], value) for index, value in enumerate(by_party)]
colors = list(pd.Series(by_party.index).map(color_dict))
plt.figure()
plt.axis("equal")
plt.pie(by_party.values, labels=labels, colors=colors, shadow=True, explode=np.zeros(len(by_party)) + 0.04)
plt.show()
fig = plt.figure()
axes = fig.add_subplot(111)
axes.barh(range(len(by_party.index)), by_party.values, color=colors)
plt.box(on="off")
axes.axvline(x=50, color="black", alpha=0.7, linewidth=2)
axes.yaxis.set_ticks([item + 0.4 for item in range(len(by_party.index))])
axes.yaxis.set_ticklabels(by_party.index, minor=False)
plt.xlabel("$113^{th}$ Senate Seats Controlled by Party")
plt.show()
Class II senators are up for re-election.
In [23]:
class_2_senators = senators[senators["class"]=="Class II"]
by_party =class_2_senators["party"].value_counts()
by_party.sort(ascending=False)
print by_party
labels = ["%s: %s" % (by_party.index[index], value) for index, value in enumerate(by_party)]
colors = list(pd.Series(by_party.index).map(color_dict))
plt.figure()
plt.axis("equal")
plt.pie(by_party.values, labels=labels, colors=colors, shadow=True, explode=np.zeros(len(by_party)) + 0.04)
plt.show()
color_dict = {"D": "b",
"R": "r",
"I": "g"}
fig = plt.figure()
axes = fig.add_subplot(111)
axes.barh(range(len(by_party.index)), by_party.values, color=colors)
plt.box(on="off")
axes.yaxis.set_ticks([item + 0.4 for item in range(len(by_party.index))])
axes.yaxis.set_ticklabels(by_party.index, minor=False)
plt.xlabel("$113^{th}$ Senate Seats of $Class II$ Controlled by Party")
plt.show()
In [24]:
class_3_senators = senators[senators["class"]=="Class III"]
by_party =class_3_senators["party"].value_counts()
by_party.sort(ascending=False)
print by_party
labels = ["%s: %s" % (by_party.index[index], value) for index, value in enumerate(by_party)]
colors = list(pd.Series(by_party.index).map(color_dict))
plt.figure()
plt.axis("equal")
plt.pie(by_party.values, labels=labels, colors=colors, shadow=True, explode=np.zeros(len(by_party)) + 0.04)
plt.show()
color_dict = {"D": "b",
"R": "r",
"I": "g"}
fig = plt.figure()
axes = fig.add_subplot(111)
axes.barh(range(len(by_party.index)), by_party.values, color=colors)
plt.box(on="off")
axes.yaxis.set_ticks([item + 0.4 for item in range(len(by_party.index))])
axes.yaxis.set_ticklabels(by_party.index, minor=False)
plt.xlabel("$113^{th}$ Senate Seats of $Class III$ Controlled by Party")
plt.show()
In [25]:
class_1_senators = senators[senators["class"]=="Class I"]
by_party =class_1_senators["party"].value_counts()
by_party.sort(ascending=False)
print by_party
labels = ["%s: %s" % (by_party.index[index], value) for index, value in enumerate(by_party)]
colors = list(pd.Series(by_party.index).map(color_dict))
plt.figure()
plt.axis("equal")
plt.pie(by_party.values, labels=labels, colors=colors, shadow=True, explode=np.zeros(len(by_party)) + 0.04)
plt.show()
color_dict = {"D": "b",
"R": "r",
"I": "g"}
fig = plt.figure()
axes = fig.add_subplot(111)
axes.barh(range(len(by_party.index)), by_party.values, color=colors)
plt.box(on="off")
axes.yaxis.set_ticks([item + 0.4 for item in range(len(by_party.index))])
axes.yaxis.set_ticklabels(by_party.index, minor=False)
plt.xlabel("$113^{th}$ Senate Seats of $Class I$ Controlled by Party")
plt.show()
Start by listing all seats in $Class II$.
In [26]:
class_2_senators = senators[senators["class"]=="Class II"].sort("state")
class_2_senators
Out[26]:
In [63]:
url = "http://www.fec.gov/data/CandidateSummary.do?format=xml"
response = requests.get(url)
page = html.fromstring(response.content)
print response.text[:1000]
In [64]:
for item in page[:10]:
print item.tag
Notice that <can_sum> encapsulates each candidate's data.
In [65]:
for item in page.xpath("//can_sum")[0]:
print "<%s>%s</%s>" % (item.tag, str(item.text), item.tag)
In [66]:
cand_list = [cand for cand in page.xpath("//can_sum") if cand.xpath("can_off")[0].text=="S"]
lin_ima = [cand.xpath("lin_ima")[0].text for cand in cand_list]
len(lin_ima)
Out[66]:
In [67]:
senate_cadidate = pd.DataFrame(lin_ima, columns=["lin_ima"])
fields = ["can_id", "can_nam", "can_off", "can_off_sta", "can_par_aff",
          "can_inc_cha_ope_sea", "ind_ite_con", "ind_uni_con", "ind_con",
          "par_com_con", "oth_com_con", "can_con", "tot_con",
          "tra_fro_oth_aut_com", "can_loa", "oth_loa", "tot_loa",
          "off_to_ope_exp", "off_to_fun", "off_to_leg_acc", "oth_rec",
          "tot_rec", "ope_exp", "fun_dis", "exe_leg_acc_dis",
          "tra_to_oth_aut_com", "can_loa_rep", "oth_loa_rep", "tot_loa_rep",
          "ind_ref", "par_com_ref", "oth_com_ref", "tot_con_ref",
          "oth_dis", "tot_dis", "cas_on_han_beg_of_per",
          "cas_on_han_clo_of_per", "net_con", "net_ope_exp",
          "deb_owe_by_com", "deb_owe_to_com", "cov_sta_dat", "cov_end_dat"]
for field in fields:
    senate_cadidate[field] = [cand.xpath(field)[0].text for cand in cand_list]
senate_cadidate
Out[67]:
In [69]:
def get_state_data(candidates):
data_set = get_2014_data(candidates)
t_ds = pd.pivot_table(data_set, values=["commentCount", "favoriteCount", "dislikeCount", "likeCount", "viewCount"],
aggfunc='sum', rows="candidate_name")
t_ds["like_dislike_r"] = t_ds["likeCount"] / (t_ds["dislikeCount"] + t_ds["likeCount"])
t_ds["views_share"] = t_ds["viewCount"] / t_ds["viewCount"].sum()
t_ds["msgs_share"] = t_ds["commentCount"] / t_ds["commentCount"].sum()
t_ds["likes_share"] = t_ds["likeCount"] / t_ds["likeCount"].sum()
t_ds["dislikes_share"] = t_ds["dislikeCount"] / t_ds["dislikeCount"].sum()
print t_ds
return t_ds
def fix_name(val_name):
val_names = val_name.split(", ")
return "%s %s" % (val_names[1].split(" ")[0].capitalize(), val_names[0].capitalize())
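`fix_name` turns an FEC-style `"LAST, FIRST MIDDLE"` string into `"First Last"`, dropping any middle name. A standalone sketch with a hypothetical input (the name below is made up, not from the data set):

```python
def fix_name(val_name):
    # Split "LAST, FIRST MIDDLE" on the comma, keep only the first given name,
    # and re-order as "First Last" with normal capitalization.
    val_names = val_name.split(", ")
    return "%s %s" % (val_names[1].split(" ")[0].capitalize(), val_names[0].capitalize())

print(fix_name("SMITH, JOHN A"))  # -> John Smith
```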
In [79]:
values_list = []
for index, state in zip(class_2_senators.index, class_2_senators["state"]):
print "%s: %s" % (state,
class_2_senators["member_full"][index])
candidates = senate_cadidate[senate_cadidate["can_off_sta"]==state]
    candidates = candidates[~candidates["tot_rec"].isnull()]
candidates["tot_rec_num"] = candidates["tot_rec"].apply(lambda x: x[1:].replace(",","")).astype(np.float64)
top_candidates = candidates.sort("tot_rec_num", ascending=False)[:2][["can_nam",
"can_par_aff",
"can_inc_cha_ope_sea",
"tot_rec_num",
"can_off_sta"]]
top_candidates["full_name"] = [fix_name(name) for name in top_candidates.values[:,0]]
top_candidates = top_candidates.sort("full_name")
print top_candidates["full_name"]
try:
ds = get_state_data([fix_name(name) for name in top_candidates.values[:,0]])
ds["state"] = state
ds["party"] = top_candidates["can_par_aff"].values
ds["donations"] = top_candidates["tot_rec_num"].values
values_list.append(ds)
except:
print "NA"
sentate_2014 = pd.concat(values_list)
sentate_2014
Out[79]:
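The `tot_rec` totals arrive as currency strings, which the cell above converts with `x[1:].replace(",","")`. The same conversion as a standalone helper, with a made-up sample value:

```python
def parse_currency(value):
    # Strip the leading "$" and thousands separators, then cast to float.
    return float(value[1:].replace(",", ""))

print(parse_currency("$1,234,567.89"))  # -> 1234567.89
```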
In [94]:
class_2_senators["state"]
Out[94]:
In [97]:
x_column = "views_share"
y_column = "viewCount"
s_column = "donations"
color_dict = {"DEM": "b", "REP": "r", "IND":"g", "NPA": "g", "DFL": "g"}
plt.figure(figsize=(18,12))
for party in sentate_2014["party"].unique():
cands = sentate_2014[sentate_2014["party"]==party]
x = cands[x_column]
y = cands[y_column]
size = sentate_2014[sentate_2014["party"]==party][s_column] / 3000000
plt.scatter(x,y, s=(np.array(size)) * 1000, c=color_dict[party], alpha=0.5)
print plt.ylim()[1]
plt.vlines(0.5, ymin=1, ymax=plt.ylim()[1]*0.9)
projected_winners = sentate_2014[sentate_2014[x_column]>0.5]["party"].value_counts()
result_text = []
for item in sentate_2014.iterrows():
    plt.annotate(item[1]["state"], xy=(item[1][x_column], item[1][y_column]))
for item in sentate_2014[sentate_2014[x_column]>0.5].iterrows():
    result_text += ["%s: %s (%s) - %0.1f%%" % (item[1]["state"], item[0], item[1]["party"], item[1]["views_share"] * 100.)]
result_text = "\n".join(result_text)
projected_winners = "\n".join(["%s:%s" % (party, value) for party, value in zip(projected_winners.index, projected_winners.values)])
plt.annotate(projected_winners, xy=(.65,plt.ylim()[1]*0.8))
plt.annotate(result_text, xy=(.8, 1.5))
plt.xlabel(x_column)
plt.ylabel(y_column + " (Log Scale)")
plt.grid()
plt.yscale("log")
#plt.axis("tight")
plt.title("Senate 2014 Elections Forecast (Size is relative and represents the amount of donations)")
plt.show()
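The scatter loop above looks party codes up with `color_dict[party]`, which raises a `KeyError` the first time an unmapped party appears in the data. `dict.get` with a fallback color keeps the loop running; `"LIB"` below is a hypothetical unmapped code used only for illustration:

```python
color_dict = {"DEM": "b", "REP": "r", "IND": "g", "NPA": "g", "DFL": "g"}

# .get falls back to gray instead of raising KeyError for unseen parties.
print(color_dict.get("DEM", "gray"))  # -> b
print(color_dict.get("LIB", "gray"))  # -> gray
```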
In [58]:
sentate_2014[sentate_2014[x_column]>0.5]
Out[58]:
In [46]:
len(sentate_2014["state"].unique())
Out[46]:
In [35]:
def get_state_data(candidates):
data_set = get_2012_data(candidates)
t_ds = pd.pivot_table(data_set, values=["commentCount", "favoriteCount", "dislikeCount", "likeCount", "viewCount"],
aggfunc='sum', rows="candidate_name")
t_ds["like_dislike_r"] = t_ds["likeCount"] / (t_ds["dislikeCount"] + t_ds["likeCount"])
t_ds["views_share"] = t_ds["viewCount"] / t_ds["viewCount"].sum()
t_ds["msgs_share"] = t_ds["commentCount"] / t_ds["commentCount"].sum()
t_ds["likes_share"] = t_ds["likeCount"] / t_ds["likeCount"].sum()
t_ds["dislikes_share"] = t_ds["dislikeCount"] / t_ds["dislikeCount"].sum()
    # Sentiment analysis of the video titles
t_ds["sentiment"] = pd.Series()
for cand in candidates:
t_ds["sentiment"][cand] = np.mean(
[TextBlob(title).polarity for title in data_set[data_set["candidate_name"]==cand]["title"]]
)
print t_ds
return t_ds
In [36]:
senate_2012 = pd.read_csv("data/2012_senate_results.csv")
senate_2012["Full Name"] = senate_2012["First Name"] + " " + senate_2012["Last Name"]
senate_2012
Out[36]:
In [37]:
senate_2012["commentCount"] = pd.Series()
senate_2012["dislikeCount"] = pd.Series()
senate_2012["favoriteCount"] = pd.Series()
senate_2012["likeCount"] = pd.Series()
senate_2012["viewCount"] = pd.Series()
senate_2012["like_dislike_r"] = pd.Series()
senate_2012["views_share"] = pd.Series()
senate_2012["msgs_share"] = pd.Series()
senate_2012["likes_share"] = pd.Series()
senate_2012["dislikes_share"] = pd.Series()
senate_2012["sentiment"] = pd.Series()
for state in np.unique(senate_2012["State Postal"]):
print state + ":"
cands = senate_2012[senate_2012["State Postal"] == state]
top_cands = cands.sort("Vote Count",ascending=False)[:2]
#print top_cands
try:
youtube_stats = get_state_data(top_cands["Full Name"].values)
#print youtube_stats
# Store Data Back
for item in youtube_stats.iterrows():
cand = item[0]
stats = item[1]
            index = senate_2012[senate_2012["Full Name"] == cand].index[0]
senate_2012["commentCount"][index] = stats["commentCount"]
senate_2012["dislikeCount"][index] = stats["dislikeCount"]
senate_2012["favoriteCount"][index] = stats["favoriteCount"]
senate_2012["likeCount"][index] = stats["likeCount"]
senate_2012["viewCount"][index] = stats["viewCount"]
senate_2012["like_dislike_r"][index] = stats["like_dislike_r"]
senate_2012["views_share"][index] = stats["views_share"]
senate_2012["msgs_share"][index] = stats["msgs_share"]
senate_2012["likes_share"][index] = stats["likes_share"]
senate_2012["dislikes_share"][index] = stats["dislikes_share"]
senate_2012["sentiment"][index] = stats["sentiment"]
except:
pass
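The write-back loop above uses chained assignment (`senate_2012["commentCount"][index] = ...`), which modern pandas flags with a `SettingWithCopyWarning` and, under copy-on-write, may silently drop. A single `.loc` write is the safe equivalent; the frame and stats below are hypothetical:

```python
import pandas as pd

# Hypothetical result frame and per-candidate stats to write back.
df = pd.DataFrame({"Full Name": ["A", "B"], "viewCount": [float("nan")] * 2})
stats = {"viewCount": 1000.0}

# One .loc call with (row label, column label) replaces df["col"][index] = value.
idx = df.index[df["Full Name"] == "A"][0]
df.loc[idx, "viewCount"] = stats["viewCount"]
print(df)
```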
In [38]:
cands_with_stats = senate_2012[~senate_2012["viewCount"].isnull()]
cands_with_stats["VotesShare"] = cands_with_stats[["Vote Count", "State Postal"]].apply(
    lambda x: x[0] / senate_2012[senate_2012["State Postal"] == x[1]]["Vote Count"].sum(), axis=1)
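The per-row `apply` above re-filters the full frame once per candidate. A `groupby(...).transform("sum")` computes every state's total in one pass; the vote counts below are made up:

```python
import pandas as pd

# Made-up race results for two hypothetical states.
df = pd.DataFrame({
    "State Postal": ["MO", "MO", "VA"],
    "Vote Count":   [600, 400, 500],
})

# transform("sum") broadcasts each state's total back onto its rows,
# so the share is a single vectorized division.
df["VotesShare"] = df["Vote Count"] / df.groupby("State Postal")["Vote Count"].transform("sum")
print(df)
```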
In [39]:
x_col = "views_share"
y_col = "VotesShare"
plt.figure(figsize=(15,10))
color_dict = {"Dem": "b", "GOP": "r", "Ind":"g", "NPA": "orange"}
shape_dict = {"X": "*", "nan": "."}
wl_dp = [len(cands_with_stats[(cands_with_stats[x_col]>=0.5) &
(cands_with_stats["Winner"]=="X")]),
len(cands_with_stats[(cands_with_stats[x_col]>=0.5)])]
wl_dm = [len(cands_with_stats[(cands_with_stats[x_col]<0.5) &
(cands_with_stats["Winner"]=="X")]),
len(cands_with_stats[(cands_with_stats[x_col]<0.5)])]
wl_50p = "Winning Ratio %s/%s ($%0.1f \%%$)" % (wl_dp[0], wl_dp[1], wl_dp[0]/wl_dp[1]*100)
wl_50m = "Winning Ratio %s/%s ($%0.1f \%%$)" % (wl_dm[0], wl_dm[1], wl_dm[0]/wl_dm[1]*100)
for cand in cands_with_stats.iterrows():
stats = cand[1]
x = stats[x_col]
y = stats[y_col]
c = color_dict[stats["Party"]]
m = shape_dict[str(stats["Winner"])]
plt.scatter(x, y, c=c, marker=m, s=500, alpha=0.5)
if stats[x_col] > 0.9:
plt.annotate(stats["Full Name"],xytext=(8,20), xy=(x,y),
textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))
plt.xlabel("YouTube " + x_col + " Between Competing Candidates in a State Race")
plt.ylabel("Actual " + y_col)
plt.vlines(.5, ymin=0, ymax=1)
plt.annotate(s=wl_50p, xy=(0.7, 1))
plt.annotate(s=wl_50m, xy=(0.2, 1))
plt.title("YouTube Video Views for Candidate from 2012-08-04 to 2012-11-04 and Actual Votes")
plt.annotate("Stars Represent Winning Candidates\nCircles Represent Losing Candidates", xy=(0.03, 0.85))
plt.annotate("Red: GOP\nBlue: Dem\nGreen: Ind\nOrange: NPA", xy=(0.03, 0.7))
plt.axis("tight")
plt.box(on="off")
plt.show()
In [40]:
cands_with_stats[cands_with_stats["State Postal"]=="MO"]
Out[40]: